A stochastic process (i.e. a time series) is considered to be strictly stationary if the properties of the process are not changed by a shift in origin.
In the time series context this means that the joint distribution of \(\{y_{t_1}, \ldots, y_{t_n}\}\) must be identical to the joint distribution of \(\{y_{t_1+k}, \ldots, y_{t_n+k}\}\) for any values of \(n\) and \(k\).
Weak Stationarity
Strict stationarity is unnecessarily strong / restrictive for many applications, so instead we often opt for weak stationarity, which requires the following:
The process must have finite variance / second moment \[E(y_t^2) < \infty \text{ for all $t$}\]
The mean of the process must be constant \[E(y_t) = \mu \text{ for all $t$}\]
The cross moment (covariance) may only depend on the lag (i.e. \(t-s\) for \(y_t\) and \(y_s\)) \[Cov(y_t,y_s) = Cov(y_{t+k},y_{s+k}) \text{ for all $t,s,k$}\]
When we say stationary in class we will almost always mean weakly stationary.
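As a quick illustration (an example introduced here, not part of the definitions above), consider white noise \(w_t\) drawn independently from \(N(0, \sigma_w^2)\), where \(\sigma_w^2\) is a hypothetical finite variance. Checking the three conditions in turn:
\[
\begin{aligned}
E(w_t^2) &= \sigma_w^2 < \infty \\
E(w_t) &= 0 \\
Cov(w_t, w_s) &= \begin{cases} \sigma_w^2 & \text{if } t = s \\ 0 & \text{if } t \neq s \end{cases}
\end{aligned}
\]
The covariance depends on \(t\) and \(s\) only through \(t-s\), so this process is weakly stationary.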
Autocorrelation
For a stationary time series, where \(E(y_t)=\mu\) and \(\text{Var}(y_t)=\sigma^2\) for all \(t\), we define the autocorrelation at lag \(k\) as
\[
\begin{aligned}
\rho_k &= Cor(y_t, y_{t+k}) \\
&= \frac{Cov(y_t, y_{t+k})}{\sqrt{\text{Var}(y_t)\,\text{Var}(y_{t+k})}} \\
&= \frac{E\left((y_t-\mu)(y_{t+k}-\mu)\right)}{\sigma^2}
\end{aligned}
\]
This can be written in terms of the autocovariance function (\(\gamma_k\)) as \[
\begin{aligned}
\gamma_k &= \gamma(t,t+k) = Cov(y_t, y_{t+k}) \\
\rho_k &= \frac{\gamma(t,t+k)}{\sqrt{\gamma(t,t) \gamma(t+k,t+k)}} = \frac{\gamma_k}{\gamma_0}
\end{aligned}
\]
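A minimal sketch of how \(\rho_k\) might be estimated from data, assuming the conventional \(1/n\) estimator of the autocovariance. The function name `sample_acf`, the seed, and the simulated AR(1) series are illustrative choices, not part of the notes.

```r
# Sample autocorrelation at lags 0..max_lag using the 1/n autocovariance estimator
sample_acf <- function(y, max_lag) {
  y <- as.numeric(y)
  n <- length(y)
  y_c <- y - mean(y)                        # center the series at its sample mean
  gamma_hat <- sapply(0:max_lag, function(k) {
    sum(y_c[1:(n - k)] * y_c[(1 + k):n]) / n
  })
  gamma_hat / gamma_hat[1]                  # rho_k = gamma_k / gamma_0
}

set.seed(1)                                 # arbitrary seed, for reproducibility only
y <- arima.sim(model = list(ar = 0.7), n = 200)
rho_hat <- sample_acf(y, max_lag = 10)
```

The result can be compared against `acf(y, lag.max = 10, plot = FALSE)$acf`, which uses the same centering and the same \(1/n\) scaling by default.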
Covariance Structure
Our definition of a (weakly) stationary process implies a covariance matrix with the following structure,
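As a sketch of what this looks like, suppose we observe \(y_1, \ldots, y_n\) (the length \(n\) and the symbol \(\Sigma\) are notation introduced here for illustration). Because \(Cov(y_t, y_s)\) depends only on \(|t-s|\), every diagonal of the covariance matrix is constant:
\[
\Sigma = \begin{pmatrix}
\gamma_0 & \gamma_1 & \gamma_2 & \cdots & \gamma_{n-1} \\
\gamma_1 & \gamma_0 & \gamma_1 & \cdots & \gamma_{n-2} \\
\gamma_2 & \gamma_1 & \gamma_0 & \cdots & \gamma_{n-3} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\gamma_{n-1} & \gamma_{n-2} & \gamma_{n-3} & \cdots & \gamma_0
\end{pmatrix}
\]
A matrix of this form is called a Toeplitz matrix, and it is fully determined by its first row.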
Let \(y_t = y_{t-1} + w_t\) with \(y_0=0\) and \(w_t \sim N(0,1)\).
ACF + PACF
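A minimal sketch of simulating this process and computing its sample ACF and PACF with base R. The series length `n` and the seed are arbitrary choices made here, and this is not necessarily the code behind the original plots.

```r
set.seed(1)                      # arbitrary seed, for reproducibility only
n <- 250                         # illustrative series length
w <- rnorm(n, mean = 0, sd = 1)  # w_t ~ N(0, 1)
y <- cumsum(w)                   # y_t = y_{t-1} + w_t with y_0 = 0

# Sample ACF and PACF of the simulated path (set plot = TRUE to draw them)
rw_acf  <- acf(y, plot = FALSE)
rw_pacf <- pacf(y, plot = FALSE)
```

For a path like this one would typically expect the sample ACF to decay very slowly and the sample PACF to be dominated by lag 1.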
Stationary?
Is \(y_t\) stationary?
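One way to work this out, assuming the \(w_t\) are mutually independent (which the definition above suggests but does not state): unrolling the recursion gives \(y_t = \sum_{i=1}^{t} w_i\), so
\[
\begin{aligned}
E(y_t) &= \sum_{i=1}^{t} E(w_i) = 0 \\
\text{Var}(y_t) &= \sum_{i=1}^{t} \text{Var}(w_i) = t \\
Cov(y_t, y_s) &= \min(t, s)
\end{aligned}
\]
The mean is constant, but the variance grows with \(t\) and the covariance depends on \(t\) and \(s\) individually rather than only on \(t-s\), so \(y_t\) is not weakly stationary.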
Partial Autocorrelation - pACF
Given these types of patterns in the autocorrelation, we often want to examine the relationship between \(y_t\) and \(y_{t+k}\) with the (linear) dependence on the intervening values \(y_{t+1}\) through \(y_{t+k-1}\) removed.
This is done through the calculation of a partial autocorrelation (\(\alpha(k)\)), which is defined as follows:
\[
\begin{aligned}
\alpha(1) &= Cor(y_t, y_{t+1}) = \rho(1) \\
\alpha(k) &= Cor\left(y_t - P_{t,k}(y_t),\; y_{t+k} - P_{t,k}(y_{t+k})\right) \quad \text{for } k \geq 2
\end{aligned}
\]
where \(P_{t,k}(y)\) is the projection of \(y\) onto the space spanned by \(y_{t+1},\ldots,y_{t+k-1}\).
pACF - Calculation
Let \(\rho(k)\) be the autocorrelation of the process at lag \(k\); the partial autocorrelation at lag \(k\) is then \(\phi(k,k)\), which is given by the Durbin-Levinson algorithm.
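The recursion, in the form it is commonly stated (reproduced here from the standard statement of the algorithm rather than from the notes), is
\[
\begin{aligned}
\phi(1,1) &= \rho(1) \\
\phi(k,k) &= \frac{\rho(k) - \sum_{j=1}^{k-1} \phi(k-1,j)\,\rho(k-j)}{1 - \sum_{j=1}^{k-1} \phi(k-1,j)\,\rho(j)} \\
\phi(k,j) &= \phi(k-1,j) - \phi(k,k)\,\phi(k-1,k-j) \quad \text{for } j = 1, \ldots, k-1
\end{aligned}
\]
A minimal R sketch of this recursion follows. The function name `durbin_levinson_pacf` is hypothetical, and the input is assumed to be the vector \(\rho(1), \ldots, \rho(K)\) without the lag 0 term.

```r
# Partial autocorrelations phi(k,k), k = 1..K, from autocorrelations rho(1..K)
durbin_levinson_pacf <- function(rho) {
  K <- length(rho)
  pacf_vals <- numeric(K)
  phi_prev <- numeric(0)                 # holds phi(k-1, 1), ..., phi(k-1, k-1)
  for (k in seq_len(K)) {
    if (k == 1) {
      phi_kk <- rho[1]                   # phi(1,1) = rho(1)
      phi_cur <- phi_kk
    } else {
      j <- seq_len(k - 1)
      num <- rho[k] - sum(phi_prev * rho[k - j])
      den <- 1 - sum(phi_prev * rho[j])
      phi_kk <- num / den
      # phi(k,j) = phi(k-1,j) - phi(k,k) * phi(k-1,k-j), then append phi(k,k)
      phi_cur <- c(phi_prev - phi_kk * rev(phi_prev), phi_kk)
    }
    pacf_vals[k] <- phi_kk
    phi_prev <- phi_cur
  }
  pacf_vals
}

# Theoretical ACF of an AR(1) with coefficient 0.5: rho(k) = 0.5^k
ar1_pacf <- durbin_levinson_pacf(0.5^(1:5))
# expected: 0.5 at lag 1 and zero at every later lag
```

For other ARMA models the output can be checked against `stats::ARMAacf(..., pacf = TRUE)`.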
tidyverts
The tidyverts is an effort headed by Rob Hyndman (of forecast and fpp3 fame) and others to provide a consistent, tidy-data-based framework for working with time series data and models.
Core packages:
tsibble - temporal data frames and related tools
fable - tidy forecasting / modeling
feasts - feature extraction and statistics
tsibbledata - sample tsibble data sets
tsibble
A tsibble is a tibble with additional infrastructure for encoding temporal data - specifically a tsibble is a tidy data frame with an index and key where
the index is the variable that describes the inherent ordering of the data (from past to present)
and the key is one or more variables that define the unit of observation over time
each observation should be uniquely identified by the index and key
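A small sketch of how an index and key are declared when building a tsibble. The toy data frame (two units, `"a"` and `"b"`, observed over three years) and its column names are hypothetical and chosen only for illustration.

```r
library(tsibble)

# Hypothetical toy data: two units observed yearly
df <- data.frame(
  unit  = rep(c("a", "b"), each = 3),
  year  = rep(2001:2003, times = 2),
  value = c(1.2, 1.5, 1.9, 0.7, 0.8, 1.1)
)

# `year` orders the observations in time; `unit` identifies each series
ts_df <- as_tsibble(df, index = year, key = unit)

index_var(ts_df)  # name of the index column
key_vars(ts_df)   # name(s) of the key column(s)
```

Each (`unit`, `year`) pair appears exactly once here, which is what the uniqueness requirement above asks for.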
global_economy
tsibbledata::global_economy
# A tsibble: 15,150 x 9 [1Y]
# Key: Country [263]
Country Code Year GDP Growth CPI Imports Exports Population
<fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Afghani… AFG 1960 5.38e8 NA NA 7.02 4.13 8996351
2 Afghani… AFG 1961 5.49e8 NA NA 8.10 4.45 9166764
3 Afghani… AFG 1962 5.47e8 NA NA 9.35 4.88 9345868
4 Afghani… AFG 1963 7.51e8 NA NA 16.9 9.17 9533954
5 Afghani… AFG 1964 8.00e8 NA NA 18.1 8.89 9731361
6 Afghani… AFG 1965 1.01e9 NA NA 21.4 11.3 9938414
7 Afghani… AFG 1966 1.40e9 NA NA 18.6 8.57 10152331
8 Afghani… AFG 1967 1.67e9 NA NA 14.2 6.77 10372630
9 Afghani… AFG 1968 1.37e9 NA NA 15.2 8.90 10604346
10 Afghani… AFG 1969 1.41e9 NA NA 15.0 10.1 10854428
# ℹ 15,140 more rows
aus_retail
tsibbledata::aus_retail
# A tsibble: 64,532 x 5 [1M]
# Key: State, Industry [152]
State Industry `Series ID` Month Turnover
<chr> <chr> <chr> <mth> <dbl>
1 Australian Capital Territory Cafes, … A3349849A 1982 Apr 4.4
2 Australian Capital Territory Cafes, … A3349849A 1982 May 3.4
3 Australian Capital Territory Cafes, … A3349849A 1982 Jun 3.6
4 Australian Capital Territory Cafes, … A3349849A 1982 Jul 4
5 Australian Capital Territory Cafes, … A3349849A 1982 Aug 3.6
6 Australian Capital Territory Cafes, … A3349849A 1982 Sep 4.2
7 Australian Capital Territory Cafes, … A3349849A 1982 Oct 4.8
8 Australian Capital Territory Cafes, … A3349849A 1982 Nov 5.4
9 Australian Capital Territory Cafes, … A3349849A 1982 Dec 6.9
10 Australian Capital Territory Cafes, … A3349849A 1983 Jan 3.8
# ℹ 64,522 more rows
as_tsibble()
Existing ts objects or data frames can be converted to tsibbles easily,
tsibble::as_tsibble(co2)
# A tsibble: 468 x 2 [1M]
index value
<mth> <dbl>
1 1959 Jan 315.
2 1959 Feb 316.
3 1959 Mar 316.
4 1959 Apr 318.
5 1959 May 318.
6 1959 Jun 318
7 1959 Jul 316.
8 1959 Aug 315.
9 1959 Sep 314.
10 1959 Oct 313.
# ℹ 458 more rows
# A tsibble: 468 x 2 [1M]
co2 t
<dbl> <mth>
1 315. 1970 Feb
2 316. 1970 Mar
3 316. 1970 Apr
4 318. 1970 May
5 318. 1970 Jun
6 318 1970 Jul
7 316. 1970 Aug
8 315. 1970 Sep
9 314. 1970 Oct
10 313. 1970 Nov
# ℹ 458 more rows
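For a plain data frame, the index column has to be named explicitly. The following is a sketch of one way to do this for the `co2` data, assuming the series starts in January 1959 as the converted ts output suggests. It is not necessarily the code that produced the output shown above, and the object names are illustrative.

```r
library(tsibble)

# Build a data frame with an explicit monthly time column
co2_df <- tibble::tibble(
  co2 = as.numeric(co2),
  t   = yearmonth(seq(as.Date("1959-01-01"), by = "1 month",
                      length.out = length(co2)))
)

# Declare `t` as the index; no key is needed for a single series
co2_ts <- as_tsibble(co2_df, index = t)
```

Storing the time column as a `yearmonth` lets tsibble recognise the data as monthly.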
plotting tsibbles
Since a tsibble is basically just a tibble, which is in turn just a data frame, both base and ggplot2 plotting methods will work with tsibbles.
# A tsibble: 176 x 3 [1M]
# Key: key [1]
index key value
<mth> <chr> <int>
1 1980 Jan Total 15136
2 1980 Feb Total 16733
3 1980 Mar Total 20016
4 1980 Apr Total 17708
5 1980 May Total 18019
6 1980 Jun Total 19227
7 1980 Jul Total 22893
8 1980 Aug Total 23739
9 1980 Sep Total 21133
10 1980 Oct Total 22591
# ℹ 166 more rows
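A minimal sketch of both plotting approaches. It uses the `co2` series from earlier rather than the data printed above, whose source is not shown, and the axis labels are illustrative.

```r
library(tsibble)
library(ggplot2)

co2_ts <- as_tsibble(co2)   # columns: index (yearmonth) and value

# Base graphics: convert the yearmonth index to Date for the x axis
plot(as.Date(co2_ts$index), co2_ts$value, type = "l",
     xlab = "Month", ylab = "co2")

# ggplot2: the tsibble is passed directly as the data argument
p <- ggplot(co2_ts, aes(x = index, y = value)) +
  geom_line()
```

Printing `p` draws the ggplot2 version.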